fix: keep agent snapshots resumable when runtime-only inputs cannot be serialized#11127
fix: keep agent snapshots resumable when runtime-only inputs cannot be serialized#11127shaun0927 wants to merge 2 commits intodeepset-ai:mainfrom
Conversation
…ialized The fallback added for agent snapshot serialization errors preserved the original runtime failure, but it could also replace the entire chat_generator or tool_invoker payload with an empty dict. That made the saved snapshot impossible to resume even when only a runtime-only field such as a streaming callback was non-serializable. This change narrows the fallback behavior: Haystack now retries those component inputs field-by-field and omits only the fields that cannot be serialized, preserving resumable fields like messages, state, and tools. A regression test covers resuming from a tool-invoker snapshot created with a non-serializable runtime callback. Constraint: Must preserve the original deepset-ai#11108 goal of not masking the real runtime error Rejected: Keep saving `{}` and document snapshots as non-resumable | breaks the existing resume contract more than necessary Confidence: high Scope-risk: narrow Reversibility: clean Directive: If agent snapshot fallback behavior changes again, verify both error preservation and snapshot resumability Tested: hatch -e test run pytest test/components/agents/test_agent_breakpoints.py -k 'resume_from_tool_invoker' -q Tested: hatch -e test run pytest test/core/pipeline/test_breakpoint.py -k 'create_agent_snapshot' -q Tested: hatch run fmt-check haystack/core/pipeline/breakpoint.py test/components/agents/test_agent_breakpoints.py Not-tested: Full unit suite and integration suite Related: deepset-ai#11126
|
@shaun0927 is attempting to deploy a commit to the deepset Team on Vercel. A member of the Team first needs to authorize it. |
|
I re-validated this fix locally against both current upstream Reproduced on current upstream
|
|
Quick follow-up on the current PR state:
So at this point I do not see any remaining contributor-side blocker on my end. If you want any change in scope or implementation, I’m happy to adjust, but otherwise this should be ready for normal review once the maintainer-side Vercel step is handled. |
There was a problem hiding this comment.
Pull request overview
Improves agent snapshot serialization during breakpoint snapshotting so that snapshots remain resumable when only some runtime-only inputs (e.g., callbacks) can’t be serialized, without masking the original runtime error.
Changes:
- Replaces the
{}fallback for failedchat_generator/tool_invokerinput serialization with a field-by-field retry that omits only the failing fields. - Adds a regression test ensuring non-serializable runtime callbacks are omitted while resumable fields (notably
state) remain. - Adds a release note documenting the resumability fix.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
haystack/core/pipeline/breakpoint.py |
Introduces _serialize_agent_component_inputs to preserve serializable agent inputs field-by-field when full serialization fails. |
test/components/agents/test_agent_breakpoints.py |
Adds regression test asserting streaming_callback is omitted but snapshot remains resumable. |
releasenotes/notes/fix-agent-snapshot-resume-after-fallback-7fd7ff9a0f8f8b87.yaml |
Release note for snapshot resumability improvement. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if not serialized_properties: | ||
| return {} | ||
|
|
||
| return { | ||
| "serialization_schema": {"type": "object", "properties": serialized_properties}, | ||
| "serialized_data": serialized_data, | ||
| } |
There was a problem hiding this comment.
_serialize_agent_component_inputs can still return a bare {} when no fields serialize, but _deserialize_value_with_schema treats {} as an invalid payload and will raise DeserializationError. Consider always returning a structurally valid payload (e.g., empty object schema + empty serialized_data) so snapshots don’t become invalid-by-format even when all fields are omitted; this can matter if the corresponding component inputs are optional for resume (e.g., resuming from a ToolBreakpoint where chat_generator inputs may not be needed).
…puts fail to serialize Address the Copilot review on deepset-ai#11127: when every field of a chat_generator or tool_invoker input fails to serialize, _serialize_agent_component_inputs previously returned a bare `{}`. The downstream `_deserialize_value_with_schema` rejects `{}` with DeserializationError, which would silently re-introduce the exact non-resumable snapshot behavior the fix was meant to prevent (e.g. when resuming from a ToolBreakpoint where the sub-component's inputs are not strictly required). The helper now always returns a structurally valid `{"serialization_schema", "serialized_data"}` pair. When all fields are omitted the payload degrades to the schema-valid empty object (`{"type": "object", "properties": {}}` + `{}`), which deserializes back to `{}` without raising. Existing unit tests in TestCreateAgentSnapshot are updated to assert the new empty-but-valid payload shape, and a new test verifies that the all-fields-fail payload round-trips through `_deserialize_value_with_schema` without raising. The release note is extended to describe the empty-payload edge case. Constraint: Must preserve the original deepset-ai#11108 goal of not masking the real runtime error Constraint: Must keep the narrower field-by-field fallback from the previous commit intact Rejected: Keep returning bare `{}` and document snapshots as non-resumable in this edge case | regresses the snapshot resume contract in the very scenario the previous commit promised to preserve Rejected: Fall back to a different marker payload (e.g. string sentinel) | breaks downstream deserializer's object/properties contract and would require changes outside the fallback helper Confidence: high Scope-risk: narrow Reversibility: clean Directive: If agent snapshot fallback behavior changes again, verify both DeserializationError is not raised on the empty-fields case and that the helper never returns a bare `{}` Tested: hatch -e test run pytest test/core/pipeline/test_breakpoint.py -k 'create_agent_snapshot' -q Tested: hatch -e test run pytest test/components/agents/test_agent_breakpoints.py -k 'non_serializable_runtime_callback' -q Tested: hatch -e test run pytest test/components/agents/test_agent_breakpoints.py -k 'resume_from_tool_invoker and not new_breakpoint' -q Tested: hatch run fmt-check haystack/core/pipeline/breakpoint.py test/core/pipeline/test_breakpoint.py test/components/agents/test_agent_breakpoints.py Tested: hatch run test:types haystack/core/pipeline/breakpoint.py Not-tested: Full unit suite and integration suite Related: deepset-ai#11126
|
Addressed the Copilot review point on ProblemWhen every field of a sub-component input failed to serialize, the helper still returned a bare Fix (commit e896522)
Local verification on this branch head
|
Related Issues
Proposed Changes:
The robustness change in #11108 prevents agent snapshot serialization errors from masking the real runtime error, but it can also replace the entire
chat_generator/tool_invokerpayload with{}when only one runtime-only field is non-serializable (for example a lambdastreaming_callback).That makes the saved snapshot non-resumable even though the essential payload (
messages,state,tools, etc.) is still serializable.This PR keeps the scope narrow:
chat_generatorortool_invokerinputs fails, it retries field-by-fieldHow did you test it?
hatch -e test run pytest test/components/agents/test_agent_breakpoints.py -k 'non_serializable_runtime_callback' -qhatch -e test run pytest test/core/pipeline/test_breakpoint.py -k 'create_agent_snapshot' -qhatch -e test run pytest test/components/agents/test_agent_breakpoints.py -k 'resume_from_tool_invoker and not new_breakpoint' -qhatch run fmt-check haystack/core/pipeline/breakpoint.py test/components/agents/test_agent_breakpoints.pyNotes for the reviewer
I intentionally kept this fix local to agent snapshot serialization instead of introducing a broader “non-resumable snapshot” concept.
The regression test covers a concrete user-visible case where a non-serializable runtime callback causes the fallback path to trigger, but the snapshot should still remain resumable because the essential state is serializable.
Checklist
fix:,feat:,build:,chore:,ci:,docs:,style:,refactor:,perf:,test:and added!in case the PR includes breaking changes.